This report is for ETC5521 Assignment 1 by Team brolga, comprising Hanchen Wang, Jiaying Zhang, Ziyao Wang (Billy), and Arthur Andersen Widjaya.

1 Introduction and motivation

It has been 145 years since the first landline telephone was invented in 1876. Since then, the telephone has changed considerably, both in how it works and in how it looks. Today, mobile technology has spread rapidly around the globe: it is estimated that more than 5 billion people have mobile devices, and over half of these connections are smartphones.

“Digital connectivity plays a critical role in bettering lives, as it opens the door to unprecedented knowledge, employment and financial opportunities for billions of people worldwide,” said ITU Secretary-General Houlin Zhao. Because the telephone plays a crucial role in our life, we think it is valuable to analyze it.

This analysis is based on information about mobile and landline phones collected by the International Telecommunication Union (ITU 2021). It examines phone subscription trends, proportions and related factors.

Specifically:

1. What is the trend in mobile phone and landline subscriptions worldwide between 1990 and 2017?

2. What was the growth rate of mobile phone subscriptions in each country from 2013 to 2017, and what was the proportion of mobile phone and landline subscriptions from 2000?

3. What is the regression relationship between phone subscriptions and other variables?

2 Data description

The datasets were downloaded from the GitHub repository of Tidy Tuesday. Tidy Tuesday (2021) is a weekly data project aimed at the R ecosystem, and this report uses the datasets it released on November 10, 2020.

Two datasets are provided in the repository, and both come from the article “Technology Adoption,” written by Ritchie and Roser (2017) and published on OurWorldInData.org.

Tidy Tuesday (2021) made only a few changes to the original data, so the data are already relatively clean and ready for analysis. The changes mainly consist of aligning the time periods of the original sources, restricting the data to 1990 to 2017, and adding the continent of each country.

2.1 Fixed (landline) telephone subscriptions vs GDP per capita (landline.csv)

The dataset on fixed (landline) telephone subscriptions vs GDP per capita also originated from “Technology Adoption.” It was published by the World Bank (Worldbank 2021, World Development Indicators: http://data.worldbank.org/data-catalog/world-development-indicators) and collected by the ITU (ITU 2021, International Telecommunication Union: https://www.itu.int/en/ITU-D/Statistics/Pages/publications/wtid.aspx). The dataset covers fixed telephone subscriptions and GDP per capita in each country between 1960 and 2017.

Fixed telephone subscriptions refers to the sum of active number of analogue fixed telephone lines, voice-over-IP (VoIP) subscriptions, fixed wireless local loop (WLL) subscriptions, ISDN voice-channel equivalents and fixed public payphones.

2.1.1 Structure of landline.csv

This dataset has 6974 observations and 7 variables. The name, type and description of each variable in landline.csv can be found in the data dictionary below.

variable class description
entity character Country
code character Country code
year double Year
total_pop double Gapminder total population
gdp_per_cap double GDP per capita, PPP (constant 2011 international $)
landline_subs double Fixed telephone subscriptions (per 100 people)
continent character Continent

2.1.2 Collection methods

Data on fixed telephone lines are derived using administrative data that countries (usually the regulatory telecommunication authority or the Ministry in charge of telecommunications) regularly, and at least annually, collect from telecommunications operators.

Data for this indicator are readily available for approximately 90 percent of countries, either through ITU’s World Telecommunication Indicators questionnaires or from official information available on the Ministry or Regulator’s website. For the rest, information can be aggregated through operators’ data (mainly through annual reports) and complemented by market research reports.

2.1.3 Data Limitation

  • Discrepancies between global and national figures may arise when countries use a different definition than the one used by ITU. Data are usually not adjusted but discrepancies in the definition, reference year or the break in comparability in between years are noted in a data note. For this reason, data are not always strictly comparable. Missing values are estimated by ITU.

Figure 2.1: Visualise the missing value in landline data

  • From Figure 2.1, we can see that 4 variables have missing values. Among them, gdp_per_cap and landline_subs have many missing values, which may have some influence on the results of the statistical analysis.
  • These data only cover the period from 1990 to 2017. Researchers who want to study more recent years need to consult other data as well.

2.2 Mobile phone subscriptions vs GDP per capita (mobile.csv)

The dataset on mobile phone subscriptions vs GDP per capita originated from the article “Technology Adoption.” It was published by the World Bank (Worldbank 2021, World Development Indicators: http://data.worldbank.org/data-catalog/world-development-indicators) and collected by the ITU (ITU 2021, International Telecommunication Union: https://www.itu.int/en/ITU-D/Statistics/Pages/publications/wtid.aspx). The dataset covers mobile phone subscriptions and GDP per capita in each country from 1960 to 2017.

2.2.1 Structure of mobile.csv

This dataset has 6277 observations and 7 variables. The name, type and description of each variable in mobile.csv can be found in the data dictionary below.

variable class description
entity character Country
code character Country code
year double Year
total_pop double Gapminder total population
gdp_per_cap double GDP per capita, PPP (constant 2011 international $)
mobile_subs double Fixed mobile subscriptions (per 100 people)
continent character Continent

Mobile cellular telephone subscriptions are subscriptions to a public mobile telephone service that provide access to the PSTN using cellular technology. The indicator includes (and is split into) the number of postpaid subscriptions, and the number of active prepaid accounts (i.e. that have been used during the last three months). The indicator applies to all mobile cellular subscriptions that offer voice communications. It excludes subscriptions via data cards or USB modems, subscriptions to public mobile data services, private trunked mobile radio, telepoint, radio paging and telemetry services.

2.2.2 Collection methods

Data on mobile cellular subscribers are derived using administrative data that countries (usually the regulatory telecommunication authority or the Ministry in charge of telecommunications) regularly, and at least annually, collect from telecommunications operators.

Data for this indicator are readily available for approximately 90 percent of countries, either through ITU’s World Telecommunication Indicators questionnaires or from official information available on the Ministry or Regulator’s website. For the rest, information can be aggregated through operators’ data (mainly through annual reports) and complemented by market research reports.

2.2.3 Data Limitation

  • Discrepancies between global and national figures may arise when countries use a different definition than the one used by ITU. Data are usually not adjusted but discrepancies in the definition, reference year or the break in comparability in between years are noted in a data note. For this reason, data are not always strictly comparable. Missing values are estimated by ITU.

Figure 2.2: Visualise the missing value in mobile data

  • From Figure 2.2, we can see that 4 variables have missing values. Among them, gdp_per_cap, total_pop and mobile_subs have many missing values, which may have some influence on the results of the statistical analysis.
  • These data only cover the period from 1990 to 2017. Researchers who want to study more recent years need to consult other data as well.

3 Data Analysis (EDA) - 1

3.1 What is the trend in subscription of phone and landline worldwide between 1990 and 2017?

Figure 3.1: The trend of mobile phone and landline subscription by continent from 1990 to 2017

Figure 3.1 shows the basic trends in mobile phone and landline subscriptions from 1990 to 2017 across the 5 continents. I combined the two datasets, used functions such as pivot_longer() to reshape the variables, and plotted the line graph with ggplotly, which gives a more interactive visualization. I grouped the data by continent rather than by country because I wanted to present a clear and intuitive trend figure, so that readers can roughly understand the basic facts reflected by the two datasets. More complex issues between countries, including those involving GDP per capita, are explained under the following questions.
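The reshaping step described above can be sketched as follows. This is a minimal illustration rather than the report's actual code: the toy rows, the join keys and the names `subs_long`, `type`, `subs_per_100` and `continent_trend` are assumptions, and the averaging by continent is one plausible way to aggregate.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy rows standing in for mobile.csv and landline.csv
mobile <- tibble(
  entity = c("A", "A", "B", "B"),
  continent = c("Europe", "Europe", "Africa", "Africa"),
  year = c(1990, 2017, 1990, 2017),
  mobile_subs = c(0.5, 120, 0, 80)
)
landline <- tibble(
  entity = c("A", "A", "B", "B"),
  continent = c("Europe", "Europe", "Africa", "Africa"),
  year = c(1990, 2017, 1990, 2017),
  landline_subs = c(40, 35, 1, 2)
)

# Join on the shared keys, then stack the two subscription columns
subs_long <- mobile %>%
  inner_join(landline, by = c("entity", "continent", "year")) %>%
  pivot_longer(
    cols = c(mobile_subs, landline_subs),
    names_to = "type",
    values_to = "subs_per_100"
  )

# Average per continent, year and subscription type (assumed aggregation)
continent_trend <- subs_long %>%
  group_by(continent, year, type) %>%
  summarise(mean_subs = mean(subs_per_100, na.rm = TRUE), .groups = "drop")

print(continent_trend)
```

A long table of this shape can be passed to a line plot with one colour per continent and one panel or line type per subscription type.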

From the combined figure, we can easily find the similarities and differences in the trends. Europe has the largest share of subscriptions of both devices over the 27 years, followed by the Americas, Asia and Oceania, while Africa has the fewest subscriptions. The development of communication equipment cannot be separated from the level of technological and economic development of a country or region. Europe, as the main market for landline and mobile phones in the second half of the 20th century, naturally had a high market share and subscription volume. Although the earliest landline telephone was invented in the Americas, the economic conditions and social level of the countries in the Americas other than the USA make it difficult for the region's subscriptions to exceed Europe's, and the same applies to Oceania. Asia and Africa consist mainly of developing countries, so their subscription volumes for both devices lag behind.

As for the differences, although mobile phone subscriptions in the five continents were virtually zero in 1990, they grew explosively over the next decade. In contrast, landline subscriptions were already at a relatively high level in 1990 in all regions except Africa, but their growth rate was significantly lower, and there was even a decline in Europe. Meanwhile, Africa has remained roughly at the bottom. Overall, landline subscriptions in all regions have grown slowly or stagnated for 17 years. These findings suggest that mobile communication equipment is gradually replacing traditional landline equipment, a trend that also reflects the rapid development of communication technology in the past 20 years.

3.2 What was the growth rate of the subscription of mobile phones in each country from 2013 to 2017 and what was the proportion of the subscription of phone and landline from 2000?

3.2.1 What was the growth rate of the subscription of mobile phones in each country from 2013 to 2017?

Figure 3.2: Growth rate of the mobile subscription in each country during 2013 and 2017

(Hover over a country to see its name and its three-year growth rate in mobile phone subscriptions.)

From Figure 3.2 we can see the change in mobile phone subscriptions in recent years. In the three years from 2015 to 2017, the growth in mobile phone subscriptions was small on average, staying at around 0.

Some countries, such as France, Russia and Mexico, even have a negative growth rate. In three years, mobile phone subscriptions in Liberia decreased by 44%, which is extremely fast. However, it is worth noting that mobile subscriptions in Somalia and Burma grew by 73% over the same period.

Instead of a rise in the number of mobile phones in developed countries, we see a partial decline. Perhaps, as mentioned by Laura Silver (2021), the developed economies had already completed the popularization of mobile phones at an earlier stage. Now, rather than an increase in numbers, the main trend is that smartphones are replacing ordinary mobile phones.
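A growth rate of this kind can be computed as the relative change between the first and last year of the window. The sketch below assumes that definition and uses invented numbers; the window endpoints (`start_year`, `end_year`) are parameters, since the report's exact window is not restated here.

```r
library(dplyr)

# Hypothetical values, not taken from mobile.csv
mobile <- tibble(
  entity = c("A", "A", "B", "B"),
  year = c(2013, 2017, 2013, 2017),
  mobile_subs = c(100, 90, 50, 75)
)

start_year <- 2013  # assumed window start
end_year <- 2017    # assumed window end

# Relative change in subscriptions between the two endpoint years
growth <- mobile %>%
  filter(year %in% c(start_year, end_year)) %>%
  arrange(entity, year) %>%
  group_by(entity) %>%
  summarise(
    growth_rate = (last(mobile_subs) - first(mobile_subs)) / first(mobile_subs),
    .groups = "drop"
  )

print(growth)
```

With these toy values, entity A shrinks by 10% and entity B grows by 50%.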

3.2.2 What was the proportion of the subscription of phone and landline from 2000?


Figure 3.3: Compare proportion of the subscription of phone and landline from 2000 to 2017

As we can see from the animation in Figure 3.3, the main trend since 2000 is that the number of fixed-line phones has been decreasing while the number of mobile phones has been increasing.

European countries have more fixed-line phones, and many people there still use them. Most African countries, by contrast, rarely use landlines.

This may be because people in European countries already had landlines, so even as mobile phones became more popular, a large proportion of people kept using them. African countries, however, rarely used landline telephones from the beginning, so with the advent of mobile phones they did not go through a transition from landline to mobile phone and chose mobile phones directly.

Figure 3.4: Proportion of the subscription of phone and landline in 2000 and 2017

Figure 3.4 gives more detailed information, comparing fixed telephone and mobile phone subscriptions per 100 people in 2000 and 2017. We can see that in most countries the number of fixed phones fell sharply, to fewer than 50 per 100 people, while the number of mobile phones rose considerably over the same period. On average, most people own a mobile phone, while there are fewer than 25 landline phones per 100 people.

This trend is not recent. In 2006, “Cell Phone Users Are Giving up Their Landline Service” (2006) already reported that one in five consumers who used wireless telephone service planned to drop their landline service, according to In-Stat. Besides, it is worth noting that Monaco has a far higher rate of fixed-line phones than any other country, while Hong Kong has nearly 2.5 mobile phones per person.

3.3 What is the regression relationship between phone subscriptions and other variables?

The spread of mobile phone devices can be measured by the number of people who use mobile phones. In the modern lifestyle, people rely more and more on mobile phones, which is significantly different from 10 or 20 years ago.

The first question showed that the penetration rate of mobile phones has been increasing year by year, so it is meaningful to look at which factors are related to mobile phone subscriptions and to find any possible relationships. Experience suggests that mobile phone subscriptions may be associated with time and GDP per capita. Thus, we use these two as the independent variables of a fitted model and apply this model to all countries to see whether it is suitable for prediction across the whole data set.

3.3.1 What is the regression relationship between phone subscriptions and year?

Figure 3.5: The fitted relationship of mobile phone subscription with year gap from 1990 to 2017

First, set up the linear model mobile_subs ~ year1990 + gdp_per_cap, in which year1990 is the difference between each year and 1990. With 1990 as the base year, this year gap takes values up to 27, which makes it easier to judge the contribution of the “year” variable to the model. Since I wanted to fit a model for each country, I used nest() and map() to fit lm to every country's nested data column, and then used augment() to plot all the models, which is a rather complicated process.
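The nest-and-map pattern described above might look like the following sketch. The simulated data, the definition `year1990 = year - 1990` and the column names `model` and `fitted` are assumptions for illustration; only the formula and the functions nest(), map(), lm and augment() come from the text.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Hypothetical simulated data for two countries
set.seed(1)
mobile <- expand_grid(entity = c("A", "B"), year = 1990:2017) %>%
  mutate(
    year1990 = year - 1990,  # assumed definition of the year gap
    gdp_per_cap = 10000 + 300 * year1990 + rnorm(n(), sd = 500),
    mobile_subs = 4 * year1990 + 0.001 * gdp_per_cap + rnorm(n(), sd = 5)
  )

# One row per country, with that country's observations in a list column
by_country <- mobile %>%
  nest(data = -entity) %>%
  mutate(
    model = map(data, ~ lm(mobile_subs ~ year1990 + gdp_per_cap, data = .x)),
    fitted = map2(model, data, ~ augment(.x, data = .y))
  )

# Flatten the augmented data so fitted values can be plotted per country
fitted_values <- by_country %>%
  select(entity, fitted) %>%
  unnest(fitted)

head(fitted_values)
```

The `.fitted` column produced by augment() is what a plot of fitted subscriptions against year would use.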

As shown in Figure 3.5, the picture looks very complicated. Yet it shows a pattern similar to that of mobile phone subscriptions in question 1, which may suggest that the fitted relationship between mobile phone subscriptions and the year gap is very similar across countries. Interestingly, the Asian regions Macao and Hong Kong have higher fitted values in 2017. According to the data, these two regions have indeed had very high subscriptions in recent years, but does this mean that their fitted models have larger slopes? Similarly, another Asian country, Myanmar, has the lowest fitted value in 2017, so does it have a smaller slope?

Figure 3.6: The fitted relationship of mobile phone subscription with year gap from 1990 to 2017

To check the slope and intercept of the model for each country, I use augment() to plot all the models. Figure 3.6 shows the intercept and year slope of each country's model.

For example, the tidy model of Australia is as follows: \[\widehat{\text{mobile\_subs}} = -274.173053 - 1.2974\,\text{year1990} + 29.3258\,\text{gdp\_per\_cap}\]

Here are some interesting findings:

  • Generally, more than two-thirds of countries have a positive slope, which means that subscriptions in most countries tend to increase steadily.

  • There are differences across the continents. Countries in Europe and Oceania tend to have a more positive slope, which means a higher rate of increase in subscriptions; countries in Asia tend to start lower but have high rates of improvement; Africa tends to start lower, has a huge range in the rate of change, and contains most of the countries with negative growth.

  • Sure enough, Macao and Hong Kong have relatively high slopes, while Myanmar has a low slope.

  • Most countries have a negative intercept. There are two possible reasons: first, all countries had nearly zero mobile phone subscriptions in 1990; second, the model is affected by another factor, GDP per capita.
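Per-country intercepts and slopes such as those summarised above can be collected with broom's tidy(). The sketch below is an assumption about how this could be done, on simulated data; the names `coefs` and `share_positive` are hypothetical.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Hypothetical simulated data for three countries
set.seed(2)
mobile <- expand_grid(entity = c("A", "B", "C"), year = 1990:2017) %>%
  mutate(
    year1990 = year - 1990,
    gdp_per_cap = 8000 + 250 * year1990 + rnorm(n(), sd = 400),
    mobile_subs = 3 * year1990 + 0.002 * gdp_per_cap + rnorm(n(), sd = 4)
  )

# Fit one model per country and spread the coefficients into columns
coefs <- mobile %>%
  nest(data = -entity) %>%
  mutate(
    model = map(data, ~ lm(mobile_subs ~ year1990 + gdp_per_cap, data = .x)),
    coef = map(model, tidy)
  ) %>%
  select(entity, coef) %>%
  unnest(coef) %>%
  select(entity, term, estimate) %>%
  pivot_wider(names_from = term, values_from = estimate)

# Share of countries whose year slope is positive
share_positive <- mean(coefs$year1990 > 0)

print(coefs)
```

Plotting the `(Intercept)` column against the `year1990` column, coloured by continent, would give a picture comparable to an intercept-versus-slope figure.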

3.3.2 What is the regression relationship between phone subscriptions and GDP per capita?

Figure 3.7: q3_gdp_lm_fig

Finally, let’s focus on the model for GDP per capita. This time, to avoid interference from the year in the model graphs, facet_wrap(~year) is used to separate the graphs by year.

As shown in Figure 3.7, the relationship between GDP per capita and mobile phone subscriptions in this model can be seen intuitively. The slope for each continent was almost zero between 1990 and 1997. After 1997, all of them have a positive slope, which shows that GDP per capita and mobile phone subscriptions are positively correlated.

This may be because a higher GDP per capita brings greater purchasing power, and people with higher purchasing power can more often afford high-tech products. In the last century, Europe, as a typical high-income region, naturally had a higher GDP per capita and therefore a higher telephone subscription volume, which in turn grew faster than in other regions.

Europe maintained a relatively high slope until 2007, when its slope was overtaken by Asia's, and then its growth rate declined slowly until 2017. This also reflects that, with changing times and the development of global technology, mobile phone products have become cheaper and more accessible. The number of subscriptions is essentially affected by additional mobile phone accounts and call charges. The rise of Asian mobile phone subscriptions may be due to the rapid development of the associated infrastructure, manufacturing and services of the mobile phone industry. To ensure small profits but quick turnover, many technology companies are willing to produce cheaper mobile phones with more affordable services to attract more consumers.

Figure 3.8: The fitted relationship of mobile phone subscription with year gap from 1990 to 2017

Figure 3.8 shows the coefficients of the model for GDP per capita and phone subscriptions. Most countries in Africa have a higher slope and a negative intercept, because the economic development of many African countries is relatively slow and unbalanced. In the initial stage, mobile phone subscriptions remain at a very low level, but some countries with a relatively higher GDP per capita, such as South Africa, can achieve a higher mobile phone subscription rate than others.

3.3.3 The applicability of the fitted model

Table 3.1: Measures of Goodness of Fit
continent r.squared adj.r.squared AIC BIC
Africa 0.8316 0.8150 173.7 178.3
Americas 0.8943 0.8838 193.5 198.1
Asia 0.8984 0.8883 181.4 186.0
Europe 0.9488 0.9420 168.8 173.1
Oceania 0.8302 0.8130 144.6 148.9

Lastly, let’s look at how well the prediction model fits. Table 3.1 uses glance() on the previously fitted models to extract goodness-of-fit measures. The r.squared values are large, and the AIC and BIC values, which penalize model complexity and so guard against artificially inflating r.squared, are at a relatively acceptable level. This suggests that the model may be reasonable for predicting mobile phone subscriptions, subject to careful further examination.
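A table with the same columns as Table 3.1 could be produced along these lines. This is only a sketch on simulated data with one observation per continent and year (an assumption about how the continent-level models were fitted); the name `fit_stats` is hypothetical.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Hypothetical continent-level data: one row per continent and year
set.seed(6)
mobile <- expand_grid(continent = c("Africa", "Europe"), year = 1990:2017) %>%
  mutate(
    year1990 = year - 1990,
    gdp_per_cap = 5000 + 400 * year1990 + rnorm(n(), sd = 800),
    mobile_subs = 4 * year1990 + 0.001 * gdp_per_cap + rnorm(n(), sd = 8)
  )

# Fit one model per continent and keep the goodness-of-fit columns
fit_stats <- mobile %>%
  nest(data = -continent) %>%
  mutate(
    model = map(data, ~ lm(mobile_subs ~ year1990 + gdp_per_cap, data = .x)),
    stats = map(model, glance)
  ) %>%
  select(continent, stats) %>%
  unnest(stats) %>%
  select(continent, r.squared, adj.r.squared, AIC, BIC)

print(fit_stats)
```

AIC and BIC are only comparable between models fitted to the same response and the same observations, so they are more useful for comparing alternative models within a continent than for ranking continents.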


Figure 3.9: The inapplicable situation

Figure 3.10: The inapplicable entities

Figure 3.9 is a histogram of the \(R^2\) values. It suggests that some countries may not have a positive linear relationship between mobile phone subscriptions and GDP per capita and the time factor.

Additionally, we construct scatter plots with a linear fit for the countries with low \(R^2\) values (i.e. filtering the countries with \(R^2<0.65\)) to check whether the relationship is non-linear in these countries. In Figure 3.10, four countries are detected. For the majority of countries the prediction model fits acceptably well, and the poor fit for these four countries may be due to incomplete data, inaccurate records of mobile phone subscriptions 15 years ago, or other underlying factors.
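The filtering step can be illustrated with a small deterministic example. The two entities below are invented: "Lin" follows a nearly linear trend, while "Osc" alternates between two values and so has almost no linear relationship with either predictor. The 0.65 threshold is the one quoted above; everything else is an assumption.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Hypothetical deterministic toy data (no random noise)
t <- 0:27
gdp <- 10000 + 300 * t + 200 * sin(t)
mobile <- bind_rows(
  tibble(entity = "Lin", year1990 = t, gdp_per_cap = gdp,
         mobile_subs = 4 * t + 0.001 * gdp + cos(3 * t)),
  tibble(entity = "Osc", year1990 = t, gdp_per_cap = gdp,
         mobile_subs = 50 + 10 * (-1)^t)
)

# R-squared of the per-entity linear model
r2_by_entity <- mobile %>%
  nest(data = -entity) %>%
  mutate(
    model = map(data, ~ lm(mobile_subs ~ year1990 + gdp_per_cap, data = .x)),
    stats = map(model, glance)
  ) %>%
  select(entity, stats) %>%
  unnest(stats) %>%
  select(entity, r.squared)

# Entities for which the linear model fits poorly
low_fit <- r2_by_entity %>% filter(r.squared < 0.65)

print(low_fit)
```

Only the oscillating entity is flagged, which mirrors the idea of singling out countries whose subscriptions do not follow a linear pattern.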

3.4 Summary of findings

In conclusion, the above analysis shows that the number of mobile phones has grown rapidly since 1990, while the number of fixed phones is also higher than in 1990 but has declined recently.

Overall, European countries still have more landlines than other regions; we guess this is because European countries have a habit of using landlines. Recently, the growth rate of mobile phones has varied from country to country: some countries are growing rapidly, while others have seen negative growth. We guess that in some countries the trend has already turned towards switching from regular mobile phones to smartphones. At the same time, Monaco’s significantly higher rate of fixed-line phone ownership and Hong Kong’s average rate of two mobile phones per person also caught our attention.

In addition, by analysing whether there is a relationship between GDP per capita and the mobile phone subscription rate in each country, we found that the higher the GDP per capita, the higher the growth rate of mobile phone subscriptions. We also looked into the trends in different regions and suggested possible reasons.

Data communication is indispensable in our daily life, so it is interesting and worthwhile to analyze and explore these data. We believe that combining the current data with smartphone subscription rates, age or education level would yield more interesting findings.

4 Data Analysis (EDA) - 2


4.1 New additional research questions:

  1. When did the transition between communication methods (i.e. between landline and mobile phones) take place?
  2. What factors may have influenced the change in mobile phone subscriptions?

4.2 Model Formation - mobile subscriptions rate & land-line subscriptions rate.

The two main points of interest in constructing a possible fitted model are the variation in mobile phone subscriptions and in landline (fixed telephone) subscriptions.

4.2.1 A scatter plot matrix

The scatter plot matrix is used to check for associations between mobile phone and landline subscriptions and the other variables.


Figure 4.1: Scatter plot matrix

From the matrix in Figure 4.1, we observe the following:

  • Mobile phone subscription has a relatively strong correlation with the “year1990” variable and a moderate positive correlation with “GDP_per_capita.”

  • Landline subscription has a moderate positive correlation with “gdp_per_capita.” Interestingly, although the variable “total_population” has a very weak negative correlation with landline subscription, it is still flagged as a significant factor.

  • Correlation is not causation!
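The correlations read off a scatter plot matrix can also be checked numerically. The sketch below uses base R's cor() on simulated data with a few missing values; the data, the variable relationships and the name `cor_mat` are assumptions, and the report's own matrix may have been drawn with a different tool.

```r
# Hypothetical simulated data with the same variable names as the datasets
set.seed(3)
n <- 60
phones <- data.frame(year1990 = rep(0:19, 3))
phones$gdp_per_cap <- 5000 + 400 * phones$year1990 + rnorm(n, sd = 2000)
phones$total_pop <- runif(n, 1e6, 1e8)
phones$mobile_subs <- 4 * phones$year1990 + rnorm(n, sd = 10)
phones$landline_subs <- 0.001 * phones$gdp_per_cap + rnorm(n, sd = 3)

# Mimic the missing values seen in the real data
phones$gdp_per_cap[c(5, 17)] <- NA

# Pairwise Pearson correlations, using all complete pairs for each cell
cor_mat <- cor(phones, use = "pairwise.complete.obs")
print(round(cor_mat, 2))

# pairs(phones) would draw a base R scatter plot matrix of the same variables
```

Using `use = "pairwise.complete.obs"` keeps rows that are missing in one variable available for correlations between the other variables.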

4.2.2 Attempt to fit the linear regression and Local regression models

The parametric regression method - use of lm.

First, we pick the variable that has the greatest association (i.e. the highest correlation) with mobile phone subscription or landline subscription.

  • Fitting the linear regression for mobile phone subscription using the “year1990” variable.
  • Fitting the linear regression for landline subscription using the “gdp_per_cap” variable.

Figure 4.2: Fitting linear regression for mobile phone subscriptions in each continent

The plots in Figure 4.2 illustrate that the “year” variable goes some way towards explaining the variation in mobile subscription, yet there are more or less non-linear patterns in the figure. This may indicate the need for additional information to obtain a better regression model.

Furthermore, besides the “year” variable, “gdp per capita” is also an important factor for changes in mobile subscription in this data. Thus, we constructed a linear regression model for mobile phone subscription:

\[\widehat{\text{mobile\_subs}} = -33.5686 + 4.8791\,\text{year1990} + 0.0009\,\text{gdp\_per\_cap}\]

## # A tibble: 3 x 5
##   term          estimate std.error statistic p.value
##   <chr>            <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept) -33.6      0.810         -41.5  7e-323
## 2 year1990      4.88     0.0491         99.3  0     
## 3 gdp_per_cap   0.000915 0.0000205      44.7  0
## # A tibble: 1 x 12
##   r.squared adj.r.squared sigma statistic p.value    df  logLik    AIC    BIC
##       <dbl>         <dbl> <dbl>     <dbl>   <dbl> <dbl>   <dbl>  <dbl>  <dbl>
## 1     0.727         0.727  27.3     6611.       0     2 -23426. 46860. 46886.
## # ... with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

Above is the statistical summary for this linear regression model. The R-squared value for this linear model is 0.7274, which suggests that around 72.74% of the variation in mobile phone subscription can be explained by the variables “year1990” and “gdp_per_cap.”

An example of a possible inference: within the same year, a one-unit increase in GDP per capita is associated with an average increase of 0.0009 in the mobile phone subscription rate.

However, this linear regression model has an obvious flaw, which is its large negative intercept. The intercept implies that the fitted mobile phone subscription rate only turns positive after 1997; before that, the fitted rate is negative, even when the effects of “year1990” and “gdp_per_cap” are taken into account. Therefore, without additional information, the linear regression model is not suitable for predicting or making insightful inferences about the variation in the mobile phone subscription rate.
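As a rough check on where the fitted value changes sign, one can ignore the GDP term and solve for the year gap \(t = \text{year1990}\) at which the fitted line crosses zero. Dropping the GDP term and taking \(t = \text{year} - 1990\) are simplifying assumptions made here for illustration; a positive GDP contribution would move the crossing earlier.

\[-33.5686 + 4.8791\,t = 0 \quad\Longrightarrow\quad t = \frac{33.5686}{4.8791} \approx 6.88\]

Under these assumptions the fitted rate is negative for \(t \le 6\) and first becomes positive at \(t = 7\), that is, around 1997.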

Diagnostics on this linear model - mobile phone subscription rate


Figure 4.3: Diagnostics for mob-subs model

Observations from diagnostic:

The four plots in Figure 4.3 are mainly used to diagnose whether the linear model is suitable and whether the model assumptions hold. The residual plot shows a pattern, and the points are not scattered randomly around the zero line. This suggests that either the current model violates the linear model assumptions or there are problems with the model specification (e.g. the choice of independent variables).

In addition, the Q-Q plot has a clearly deviating upper tail, which indicates that the sample quantiles diverge from the theoretical ones and may lead to inaccurate predictions or conclusions. Lastly, the Cook’s D and leverage plot shows that there are multiple extreme values in the sample data, which may be why a spurious linear relationship between the variables appeared (i.e. the findings in EDA-1, 3.3).
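The quantities behind such diagnostic plots can be extracted directly from a fitted lm object. The sketch below uses simulated data with deliberately curved growth, so a straight-line fit leaves a residual pattern; the data, the name `diagnostics` and the 4/n cut-off for Cook's distance (a common rule of thumb, not necessarily the one used in the report) are assumptions.

```r
# Hypothetical simulated data with curved growth in subscriptions
set.seed(4)
n <- 100
d <- data.frame(year1990 = rep(0:24, 4))
d$gdp_per_cap <- 5000 + 500 * d$year1990 + rnorm(n, sd = 3000)
d$mobile_subs <- 0.2 * d$year1990^2 + 0.0005 * d$gdp_per_cap + rnorm(n, sd = 5)

fit <- lm(mobile_subs ~ year1990 + gdp_per_cap, data = d)

# Per-observation diagnostic quantities
diagnostics <- data.frame(
  fitted = fitted(fit),
  resid = resid(fit),
  std_resid = rstandard(fit),
  cooks_d = cooks.distance(fit),
  leverage = hatvalues(fit)
)

# Observations flagged as influential under the assumed 4/n rule of thumb
influential <- which(diagnostics$cooks_d > 4 / n)

# plot(fit) would draw base R's standard diagnostic panels
head(diagnostics)
```

Plotting `resid` against `fitted` for this toy model shows a U-shaped pattern, the same kind of systematic structure that signals a misspecified linear model.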

Figure 4.4: Fitting linear regression for telephone subscriptions in each continent

Figure 4.4 illustrates that the “gdp_per_cap” variable may not be a major factor in explaining the variation in the landline subscription rate. Only two out of five continents appear to show a linear relationship. Linking back to EDA - 1, this confirms that in some continents the associations between the variables and the landline subscription rate may not be linear. This suggests the need for additional information to obtain a better regression model.

Meanwhile, the other three variables provided in the data set are also important factors in explaining the variation of landline subscriptions. Thus, we constructed a linear regression model for landline subscription:

\[\widehat{\text{landline\_subs}} = 11.408 + 0.0006\,\text{gdp\_per\_cap} + 0.139\,\text{mobile\_subs} - 0.6326\,\text{year1990}\]

(Note: Variable “total_pop”’s effect in this model is too small to show, hence hidden for presentation.)

## # A tibble: 5 x 5
##   term        estimate     std.error statistic   p.value
##   <chr>          <dbl>         <dbl>     <dbl>     <dbl>
## 1 (Intercept)  1.14e+1 0.509             22.4  4.95e-105
## 2 gdp_per_cap  5.68e-4 0.0000132         43.1  0        
## 3 total_pop    1.86e-9 0.00000000162      1.15 2.51e-  1
## 4 mobile_subs  1.39e-1 0.00792           17.6  1.22e- 66
## 5 year1990    -6.33e-1 0.0493           -12.8  5.05e- 37
## # A tibble: 1 x 12
##   r.squared adj.r.squared sigma statistic p.value    df  logLik    AIC    BIC
##       <dbl>         <dbl> <dbl>     <dbl>   <dbl> <dbl>   <dbl>  <dbl>  <dbl>
## 1     0.501         0.500  13.6     1055.       0     4 -16963. 33937. 33975.
## # ... with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

Above is the statistical summary for this linear regression model. The R-squared value for this linear model is 0.5006, which suggests that around 50.06% of the variation in landline subscription can be explained by the variables “total_pop,” “gdp_per_cap,” “year1990” and “mobile_subs.”

This linear model contains all the available variables for explaining the variation of landline subscription, yet the statistics suggest they may not be the most suitable ones (a low R-squared value). Moreover, the estimated coefficients in the model are very small. Although the p-values suggest that all variables except “total_pop” (p-value 0.251) are significant, their contribution to explaining and predicting the changes in landline subscription is limited. Hence, understanding the variation of the landline subscription rate, and constructing a possible prediction model, requires additional relevant information.

Diagnostics on this linear model - land-line subscription rate


Figure 4.5: Diagnostics for landline-subs model

Observations from the diagnostics:

As before, it is necessary to check whether a linear model is suitable for landline subscription rates. Figure 4.5 contains the residual plot, in which the points show a clear pattern and do not scatter randomly around the zero line. This suggests either that the linear model assumptions do not hold or that the model is misspecified (for example, in the choice of independent variables).

The Q-Q plot deviates clearly at both the top and bottom tails, which indicates that the residuals depart from the theoretical normal quantiles, so predictions and inference based on this model may be inaccurate. The Cook’s D and leverage plots show multiple extreme values in the sample data, more than in the previous model diagnostics (in 4.3).
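The quantities behind these diagnostic panels can also be computed directly in base R. The sketch below is an illustration on simulated data, not the report's model: the curved relationship, the variable names, and the cut-offs `4 / n` and `2p / n` (common rules of thumb, not hard thresholds) are all assumptions.

```r
# Simulated data only: the true relationship is curved on purpose.
set.seed(2)
n <- 150
x <- runif(n, 0, 10)
y <- 2 + 0.5 * x^2 + rnorm(n)
fit <- lm(y ~ x)               # deliberately misspecified straight line

res <- residuals(fit)          # values shown in a residual plot
cd  <- cooks.distance(fit)     # influence of each observation (Cook's D)
lev <- hatvalues(fit)          # leverage of each observation

# A quadratic trend left in the residuals signals a misspecified mean function.
curv_p <- coef(summary(lm(res ~ x + I(x^2))))["I(x^2)", "Pr(>|t|)"]

# Rule-of-thumb flags for influential and high-leverage points.
flag_cook <- which(cd > 4 / n)
flag_lev  <- which(lev > 2 * length(coef(fit)) / n)

# Coordinates of a normal Q-Q plot: theoretical (x) vs sample (y) quantiles.
qq <- qqnorm(res, plot.it = FALSE)
```

A very small `curv_p` corresponds to the kind of non-random residual pattern described above, while `flag_cook` and `flag_lev` list the observations that would stand out in the Cook's D and leverage panels.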

The non-parametric regression method - use of LOESS.

The attempts to fit linear regressions for both mobile phone subscription and landline subscription suggest that this approach may not be the best way to understand the variation in either target variable. A non-parametric approach may shed some light on the issue.

Based on the earlier statistical output, a non-parametric regression is fitted for both mobile phone subscription and landline subscription. To allow comparison with the previous fits (i.e. 4.2 and 4.4), the model for mobile phone subscription uses the variable “year1990”, while the model for landline subscription uses the variable “gdp_per_cap”.
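As a rough illustration of what a LOESS fit does compared with a straight line, the sketch below uses base R's `loess()` on simulated data. The S-shaped adoption curve, the interpretation of `year1990` as years since 1990, and the settings `span = 0.75` and `degree = 2` (the defaults of `loess()`) are assumptions for this example and are not taken from the report.

```r
# Simulated panel: an S-shaped adoption curve plus noise (illustrative only).
set.seed(3)
year1990 <- rep(0:27, each = 5)     # assumed to mean years since 1990
mobile_subs <- 120 / (1 + exp(-(year1990 - 15) / 3)) +
  rnorm(length(year1990), sd = 5)
dat <- data.frame(year1990, mobile_subs)

# Straight-line fit versus local regression on the same predictor.
lin_fit <- lm(mobile_subs ~ year1990, data = dat)
lo_fit  <- loess(mobile_subs ~ year1990, data = dat, span = 0.75, degree = 2)

# Smoothed curve evaluated once per year.
grid <- data.frame(year1990 = 0:27)
lo_pred <- predict(lo_fit, newdata = grid)

# Root mean squared residual of each fit.
rmse <- function(r) sqrt(mean(r^2))
rmse_lin <- rmse(residuals(lin_fit))
rmse_lo  <- rmse(residuals(lo_fit))
```

On data of this shape the LOESS residual error is smaller than the linear one because the local fits can follow the curvature. A smaller `span` follows the data more closely at the cost of a rougher curve.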

Figure 4.6: Fitting a non-parametric regression for mobile subscription by continents

Figure 4.7: Fitting a non-parametric regression for mobile subscription

Figure 4.6 provides a comparison to Figure 4.2. Africa and Asia share a similar fitted curve, and Europe and the Americas (both North and South) share a similar pattern, while Oceania is the odd one out with a pattern of its own.

Figure 4.7 shows the overall regression curve. It is not a straight line, but it provides some insight for later model formation and data analysis.

Figure 4.8: Fitting a non-parametric regression for land-line subscription by continents

Figure 4.9: Fitting a non-parametric regression for land-line subscription

Figure 4.8 provides a comparison to Figure 4.4. Every continent has its own distinct regression pattern, which may indicate that the landline subscription rate cannot be predicted from the information in the given data sets.

Even Figure 4.9 does not show an acceptably good fit for the data as a whole. This further supports the view that the information given is not enough to construct a reliable model to explain the variation in landline subscription rates.

4.3 Summary of findings

  • For new research question 2:
  1. For both mobile subscription and landline subscription, the linear regression model is likely not a suitable model for explaining the variation in either variable in this case.
  2. The earlier testing in EDA-1, section 3.3, has been shown to be incorrect by the evidence from the statistical diagnostic results in this section.
  3. Fitting a non-parametric regression provides some hints for further model formation for mobile subscription, whereas model formation for landline subscription requires additional relevant information in further analysis.

5 Conclusion

6 Acknowledgements

  • The following packages are used to produce this report: naniar (Tierney et al. 2020), dplyr (Wickham et al. 2021), readr (Wickham and Hester 2020), tidyverse (Wickham 2021), rgdal (Bivand, Keitt, and Rowlingson 2021), knitr (Xie 2021b), leaflet (Cheng, Karambelkar, and Xie 2021), RColorBrewer (Neuwirth 2014), ggplot2 (Wickham 2016), gganimate (Pedersen and Robinson 2020), gifski (Ooms 2021), plotly (Sievert 2020), bookdown (Xie 2021a), kableExtra (Zhu 2021), ggResidpanel (Goode and Rey 2019), broom (Robinson, Hayes, and Couch 2021), janitor (Firke 2021), gridExtra (Auguie 2017).

  • The background map information came from Bjorn Sandvik (2021).

References

Auguie, Baptiste. 2017. gridExtra: Miscellaneous Functions for "Grid" Graphics. https://CRAN.R-project.org/package=gridExtra.
Bivand, Roger, Tim Keitt, and Barry Rowlingson. 2021. Rgdal: Bindings for the ’Geospatial’ Data Abstraction Library. https://CRAN.R-project.org/package=rgdal.
Bjorn Sandvik. 2021. “World Borders Dataset.” http://thematicmapping.org/downloads/world_borders.php.
“Cell Phone Users Are Giving up Their Landline Service.” 2006. Research Alert. Whitaker & Company, Publishers, Inc.
Cheng, Joe, Bhaskar Karambelkar, and Yihui Xie. 2021. Leaflet: Create Interactive Web Maps with the JavaScript ’Leaflet’ Library. https://CRAN.R-project.org/package=leaflet.
Firke, Sam. 2021. Janitor: Simple Tools for Examining and Cleaning Dirty Data. https://CRAN.R-project.org/package=janitor.
Goode, Katherine, and Kathleen Rey. 2019. ggResidpanel: Panels and Interactive Versions of Diagnostic Plots Using ’Ggplot2’. https://CRAN.R-project.org/package=ggResidpanel.
ITU. 2021. “World Telecommunication/ICT Indicators Database.” https://www.itu.int/en/ITU-D/Statistics/Pages/publications/wtid.aspx.
Laura Silver. 2021. “Smartphone Ownership Is Growing Rapidly Around the World, but Not Always Equally.” https://www.pewresearch.org/global/2019/02/05/smartphone-ownership-is-growing-rapidly-around-the-world-but-not-always-equally/.
Neuwirth, Erich. 2014. RColorBrewer: ColorBrewer Palettes. https://CRAN.R-project.org/package=RColorBrewer.
Ooms, Jeroen. 2021. Gifski: Highest Quality GIF Encoder. https://CRAN.R-project.org/package=gifski.
Pedersen, Thomas Lin, and David Robinson. 2020. Gganimate: A Grammar of Animated Graphics. https://CRAN.R-project.org/package=gganimate.
Ritchie, Hannah, and Max Roser. 2017. “Technology Adoption.” Our World in Data.
Robinson, David, Alex Hayes, and Simon Couch. 2021. Broom: Convert Statistical Objects into Tidy Tibbles. https://CRAN.R-project.org/package=broom.
Sievert, Carson. 2020. Interactive Web-Based Data Visualization with R, Plotly, and Shiny. Chapman and Hall/CRC. https://plotly-r.com.
Tidy Tuesday. 2021. “A Weekly Social Data Project in R.” https://github.com/rfordatascience/tidytuesday.
Tierney, Nicholas, Di Cook, Miles McBain, and Colin Fay. 2020. Naniar: Data Structures, Summaries, and Visualisations for Missing Data. https://github.com/njtierney/naniar.
Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.
———. 2021. Tidyverse: Easily Install and Load the Tidyverse. https://CRAN.R-project.org/package=tidyverse.
Wickham, Hadley, Romain François, Lionel Henry, and Kirill Müller. 2021. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.
Wickham, Hadley, and Jim Hester. 2020. Readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.
Worldbank. 2021. “World Development Indicators.” https://datacatalog.worldbank.org/dataset/world-development-indicators.
Xie, Yihui. 2021a. Bookdown: Authoring Books and Technical Documents with R Markdown. https://CRAN.R-project.org/package=bookdown.
———. 2021b. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://yihui.org/knitr/.
Zhu, Hao. 2021. kableExtra: Construct Complex Table with ’Kable’ and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra.